Query Term Selection Strategies for Web-based Chinese Factoid Question Answering

Authors

  • Hao Tang
  • Cheng-Wei Lee
  • Tian-Jian Jiang
  • Wen-Lian Hsu
Abstract

Passage retrieval plays an important role in a Chinese factoid Question Answering (QA) system. Query term selection is the process of choosing keywords from a given question so as to make the best use of information retrieval engines. Query terms selected by humans are analyzed both to measure the difficulty of the task and to evaluate machine-generated results. Three approaches are studied in this paper: stop word elimination, rule-based selection, and machine learning-based selection. Eliminating stop words is the simplest. Heuristic rules produced by morphologists are more complex. Conditional Random Fields (CRF), a machine learning approach, are adopted for labeling query terms. For evaluation, two sets of metrics are proposed. Passage MRR/Coverage relies on search engine results; it relates directly to QA performance but is time-consuming to compute and may vary over time. Our experiment shows that Query Term Precision/Recall is a viable alternative. The baseline Coverage of sending raw questions to Google is about 53%, while the three approaches yield 65% for stop word elimination, 57% for the rule-based approach, and 54% for the machine learning-based approach. The baseline MRR of sending raw questions to Google is 0.33, while the three approaches yield 0.44 for stop word elimination, 0.41 for the rule-based approach, and 0.38 for the machine learning-based approach. The results can be applied not only to factoid QA systems but also as a preprocessor for search engines.
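The two sets of metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not define them, so the definitions here are assumptions based on common usage: MRR is taken as the mean reciprocal rank of the first retrieved passage containing the answer, Coverage as the fraction of questions with at least one such passage, and Query Term Precision/Recall as set overlap between machine-selected and human-selected terms. All function names are hypothetical, and substring matching stands in for whatever answer-matching procedure the authors actually used.

```python
from typing import List, Optional, Sequence, Set, Tuple


def first_hit_rank(passages: Sequence[str], answer: str) -> Optional[int]:
    """Return the 1-based rank of the first passage containing the answer, or None."""
    for rank, passage in enumerate(passages, start=1):
        if answer in passage:  # assumption: plain substring match
            return rank
    return None


def passage_mrr_and_coverage(
    runs: List[Tuple[Sequence[str], str]]
) -> Tuple[float, float]:
    """Compute (MRR, Coverage) over (ranked passages, answer) pairs, one pair per question."""
    if not runs:
        return 0.0, 0.0
    ranks = [first_hit_rank(passages, answer) for passages, answer in runs]
    hits = [r for r in ranks if r is not None]
    mrr = sum(1.0 / r for r in hits) / len(ranks)  # questions without a hit contribute 0
    coverage = len(hits) / len(ranks)
    return mrr, coverage


def query_term_precision_recall(
    selected: Set[str], gold: Set[str]
) -> Tuple[float, float]:
    """Compare machine-selected query terms against human-selected (gold) terms."""
    overlap = len(selected & gold)
    precision = overlap / len(selected) if selected else 0.0
    recall = overlap / len(gold) if gold else 0.0
    return precision, recall
```

Under these assumed definitions, the passage-based metrics need a live search engine call per question, whereas the term-based metrics need only the human annotations, which is consistent with the abstract's remark that the former are time-consuming and may vary over time.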


Similar resources

Boosting Passage Retrieval through Reuse in Question Answering

Question Answering (QA) is an emerging important field in Information Retrieval. In a QA system the archive of previous questions asked from the system makes a collection full of useful factual nuggets. This paper makes an initial attempt to investigate the reuse of facts contained in the archive of previous questions to help and gain performance in answering future related factoid questions. I...


Monolingual Web-Based Factoid Question Answering In Chinese, Swedish, English And Japanese

In this paper we extend the application of our statistical pattern classification approach to question answering (QA) which has previously been applied successfully to English and Japanese to develop two prototype QA systems in Chinese and Swedish. We show what data is necessary to achieve this and also evaluate the performance of the two new systems using a translation of the TREC 2003 factoid...


Open-Domain Non-factoid Question Answering

We present an end-to-end system for open-domain non-factoid question answering. We leverage the information on the ever-growing World Wide Web, and the capabilities of modern search engines to find the relevant information. Our QA system is composed of three components: (i) query formulation module (QFM) (ii) candidate answer generation module (CAGM) and (iii) answer selection module (ASM). A t...


Component Analysis of a Chinese Factoid Question-Answering System

An analysis is provided for three major components of a simple Chinese Question-Answering system: passage retrieval, entity extraction and candidate selection. The order of least effective component is determined to be: answer selection, retrieval and extraction. In crosslingual QA, deficiencies in question translation not only lead to retrieval loss, but may also have adverse effects at answer...


Answering the Hard Questions

We present an end-to-end system for open-domain non-factoid question-answering. To accomplish this we leverage the information on the ever-growing World Wide Web, and the capabilities of commercial search engines to find the relevant information. Our QA system is composed of three components: (i) query formulation module (QFM) (ii) candidate answer generation module (CAGM) and (iii) answer sele...




Publication date: 2006